Abstract

We present some explorations in the analysis of the six stop consonants /k, p, t, b, d, g/
using wavelet transform domain information in their acoustic manifestation. Three different
strategies are tried: analysis via classification, the waveletogram, and reconstruction
from modulus maxima. The main difficulties of the stop consonant problem lie in the
nonstationary and nonlinear statistical structure of the acoustic signal in the burst and
transition regions. Nonstationarity renders the application of Fourier Transform (FT)
methods questionable. The Wavelet Transform has demonstrated good time-frequency
localization properties and is therefore an appropriate tool for the analysis of nonstationary
signals like speech. Moreover, unlike LPC and HMM modeling, we do not assume here any model
for the input speech. The Discrete Wavelet Transform (DWT) may also be implemented as a fast,
pyramidal algorithm. The analysis via classification produces 83% correct classification for
the unvoiced stop consonants /k, p, t/. We were looking for some explicit time information,
such as voice onset time and place of occurrence of the burst, from the waveletogram, but it
fails to give explicit results. Mallat's algorithm for signal reconstruction from the values
of the modulus maxima and their positions is successfully implemented in an effort to
characterize the signal in terms of these features.
Chapter 1


Introduction 


1.1 Review 

Speech is perhaps the most natural and fastest mode not only of human communication
but also of man-machine communication. Successful development of automatic speech
recognition and synthesis systems for the man-machine communication interface is language
dependent. Different languages are characterized by different types and numbers of basic
speech units, such as phonemes, and by different rules of pronunciation, joining of phonemes,
etc. The English language has 42 phonemes. Of these, the six stop consonants /k, p, t, b, d, g/
present the greatest difficulty in recognition tasks. An efficient, compact characterization
of these stop consonants from their acoustic manifestation is therefore a crucial step in the
development effort for speech recognition and synthesis systems. This is the major
objective of the present exploration.

The methodology for achieving the above-mentioned objective is to analyze the acoustic
signal corresponding to the utterances of stop consonants using appropriate signal processing
techniques, and in particular the so-called "Wavelet Transform". The rationale for this
approach is elaborated in the following discussion.

Throughout the history of digital speech processing, which is almost four decades old at
present, the effort has been to find accurate representations of speech signals that can
be used for speech coding, enhancement, recognition and synthesis. Drawing an analogy
from the uniformity (within limits) of processing units in the human brain, and being
driven by implementational considerations, researchers have tried to seek out flexible
structures that comply with the vast bodies of loosely related empirical results. Following
rapid advances in the science of spectral estimation in the past two decades, it has been
possible to find reasonably accurate linear models for the stationary parts of vowel spectra.
However, these models (including the Fourier transform with windowing, STFT and LPC)
fail in terms of their transient-detection capability and scalability. The result is that
it is not possible to extend their use to the characterization of stop consonants.
Stop consonants are acoustically characterized by rapid changes in the short-time energy
spectrum, preceded or followed by a fairly long period (of the order of several centiseconds)
during which there is no energy in any band above the voicing component (i.e. above 300
Hz). The rapid opening of the oral cavity during the articulation of stops produces a
wave-burst with high-frequency spectral content. Further, when a stop is adjacent to a
vowel, the movement of the oral cavity to and/or from the closure results in rapid changes
in the formant frequencies, known as transitions. Since all these events occur within a
fraction of the time taken for vowel articulation, a large or inflexible window size will lead
to the loss of the related acoustic cues.

From the physiological point of view, the six stop consonants /k, p, t, b, d, g/ differ from one
another in two phonetic features: the place of articulation and voicing.
The former divides the six stop consonants into labials /p, b/ (closure of the vocal tract
at the lips), alveolars /t, d/ (closure occurs behind the teeth) and velars /k, g/ (closure
occurs at the velum). The feature voicing divides the consonants into voiced /b, d, g/
and voiceless /p, k, t/ consonants. Physiologically, voicing is related to the timing between
the muscle activity of the larynx and the vocal tract articulation. In non-whispered speech,
voicing is represented by the so-called voice onset time (VOT) or phonation onset parameter,
defined to be the time interval between the release of the burst and the onset of vocal cord
pulsing (i.e. the onset of phonation) [9].

There is great statistical variability associated with VOT, depending upon, for
example, the phonetic environment (e.g. a stop consonant may be located between two
vowels, or it may be a word-initial or syllable-final consonant). Furthermore, the voiced
stop consonants /d, b, g/ may be either prevoiced (negative VOT or "voicing lead") or have
a short VOT, in the sense that the vocal cord pulsing begins 5 to 15 ms after the burst
release. The voiceless stop consonants /p, k, t/ have a long VOT (vocal cord pulsing begins
20 to 40 ms after the burst release). The existence of invariant acoustic features for place
of articulation is easier to understand than that for the phonetic feature voicing.
Consideration of the vocal tract articulatory activity indicates that such features for place
of articulation should be located in the tens of milliseconds in the vicinity of the burst
release. During this time interval, the articulators have not yet moved into the target
positions for the next speech segment and consequently are less likely to be affected by the
transition from one articulatory position to another.

However, the wavelet transform, a relatively recent development in signal processing, may
provide a solution to some of these problems. Frozen at an instant of time, the wavelet
coefficients represent the instantaneous output vector of a constant-Q filter bank;
for a particular filter, the output over time is a band-pass filtered, decimated time series.
One of the important characteristics of the wavelet transform is the inverse variation
of the analysis window lengths with the centre frequencies of the respective band-pass filters.
This may serve as an advantage in analyzing short-time transition phenomena like stop
consonants. The fact that the outputs for a particular scale form a time series may allow
methodologies that make use of time domain features like phase characteristics or time
synchronization information. On the other hand, an easy extension allows the interpretation
of wavelets as filter-bank coefficients. At the same time, it might be noted
that appropriate windows will automatically be chosen for the analysis of the lower-frequency
formants of accompanying vowels, thereby benefitting from the flexibility of the approach.
The scalability of the model derives its character from its interpretation as the projection
onto a series of nested functional subspaces at varying resolutions, thus explaining the use
of wavelets for multiresolution analysis and signal compression.

The implementation of a discrete wavelet transform involves the inner product with
scaled and shifted versions of a single "mother" wavelet; the mother wavelet therefore
crucially defines the wavelet basis. For the analysis of speech, one major question
to be solved is thus to find a mother wavelet that will allow the efficient representation of
speech signals. For a specific choice of wavelet basis, the next question to be solved is
how the wavelet coefficients correlate with stop consonants in general, and with the respective
positions and manners of articulation (which uniquely characterize stop consonants) in
particular. With these questions in mind, three methods of analysis are explored in the
present thesis.


1.2 Organization of the Thesis 

Wavelet transform theory is described in Chapter 2. Chapter 3 describes the methods
of analysis; three methods of analysis are explored in that chapter. Results and
discussions are presented in Chapter 4. Finally, Chapter 5 deals with the conclusion
and the future scope of work. Appendix A describes the projection operator on the affine space.



Chapter 2 

Wavelet Transform 


2.1 Introduction 

For a one-dimensional function $f(x)$, the Fourier transform $F(\xi)$ is another representation
of the same function, specified in the frequency domain. The Fourier transform (and
subsequently the inverse Fourier transform) is orthogonal, and all information in the original
signal is preserved in the frequency domain. We can write the inverse expression in the form

$$ f(x) = \int_{-\infty}^{+\infty} F(\xi)\, e^{j 2\pi \xi x}\, d\xi \qquad (2.1) $$

where $F(\xi)$ is the Fourier transform of $f(x)$, given by

$$ F(\xi) = \int_{-\infty}^{+\infty} f(x)\, e^{-j 2\pi \xi x}\, dx \qquad (2.2) $$
This indicates that the function $f(x)$ is expressed in terms of its projections on the basis
functions $e^{j 2\pi \xi x}$. These functions are perfectly localized in the frequency domain
but have infinite extent in the time domain. Hence any time-localized information in $f(x)$,
such as an abrupt change, is spread over the whole spectrum, and the Fourier transform lacks
knowledge of the time-localized features of the signal. To overcome this problem Gabor
introduced the windowed Fourier transform, also known as the Short Time Fourier Transform
(STFT). The STFT of a signal $f(t)$ is defined as

$$ STFT_f(\tau, \xi) = \int_{-\infty}^{+\infty} f(t)\, h(t - \tau)\, e^{-j 2\pi \xi t}\, dt \qquad (2.3) $$

where $h(t)$ is a low pass window, such as a Gaussian window, which is localized in the
frequency domain as well as in the time domain. The parameter $\xi$ specifies the conventional
frequency and $\tau$ specifies the time shift. Thus the transform $STFT_f(\tau, \xi)$ has
significant magnitude only if the signal $f(t)$ has a significant spectral component of
frequency $\xi$ in the neighbourhood of the time instant $t = \tau$. To get fine frequency
resolution the window should be as wide as possible, whereas fine time resolution requires
the window to be as narrow as possible. These conflicting requirements cannot be satisfied
simultaneously, and the uncertainty principle gives a lower bound on the product of the time
resolution and the frequency resolution [1]. The STFT uses a window of fixed
support, hence of fixed bandwidth. This type of analysis is called uniform filter bank
analysis [3].

In this regard the Wavelet Transform (WT) offers an advantage over the other time-frequency
distributions. It allows fine time resolution at small scales (high frequencies) and, vice
versa, fine frequency resolution at large scales (low frequencies). Along with
wavelets comes the notion of scale. The continuous wavelet transform of $f(x)$ is defined as

$$ CWT_f(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(x)\, \psi^*\!\left(\frac{x - \tau}{a}\right) dx \qquad (2.4) $$

where the window function $\psi(x)$ is the basic or mother wavelet and is a bandpass function;
$\psi((x - \tau)/a)/\sqrt{a}$ is the wavelet basis function, sometimes called a baby wavelet.
By the change of variable $x \to a x$, the above equation becomes

$$ CWT_f(a, \tau) = \sqrt{a} \int_{-\infty}^{+\infty} f(a x)\, \psi^*\!\left(x - \frac{\tau}{a}\right) dx \qquad (2.5) $$

showing the equivalence between scaling $\psi(x)$ in (2.4) and scaling $f(x)$ in (2.5) to obtain
the wavelet transform. In the wavelet transform, the analyzing window is scaled (i.e. either
dilated or contracted) and the shifting parameter $\tau$ is made a function of the scale
parameter $a$. Thus the narrow windows are shifted by small intervals and the wider windows
by large intervals. This results in "constant Q" filter bank analysis [3]. The wavelet
transform given by eqns (2.4) and (2.5) is invertible provided the mother wavelet satisfies
the admissibility condition [1] given as
$$ \int_{-\infty}^{+\infty} \frac{|\Psi(\xi)|^2}{|\xi|}\, d\xi < \infty \qquad (2.6) $$

where

$$ \Psi(\xi) = \int_{-\infty}^{+\infty} \psi(x)\, e^{-j 2\pi \xi x}\, dx \qquad (2.7) $$

is the Fourier transform of $\psi(x)$.

When $\psi(x)$ has fast decay at infinity [1], the admissibility condition can also be
written as

$$ \int_{-\infty}^{+\infty} \psi(x)\, dx = 0 \qquad (2.8) $$

This indicates that $\psi(x)$ is oscillatory.


2.2 Discrete Wavelet Transform 

In both eqns (2.4) and (2.5), $(a, \tau)$ are continuous and there is redundancy in the CWT
representation of $f(x)$. For practical computation $(a, \tau)$ should take only a finite number
of values. $(a, \tau)$ are thus discretised on a finite grid, given by

$$ a = a_0^m, \qquad \tau = n \tau_0 a_0^m \qquad (2.9) $$

where $m, n$ are integers.

The discrete wavelet transform of $f(t)$ is given by

$$ DWT_f(m, n) = \int_{-\infty}^{+\infty} f(t)\, \psi^*_{m,n}(t)\, dt \qquad (2.10) $$

where

$$ \psi_{m,n}(t) = a_0^{-m/2}\, \psi(a_0^{-m} t - n \tau_0) \qquad (2.11) $$
[Figure 2.1: The Dyadic Sampling Grid. Vertical axis: scale m; horizontal axis: time shift $\tau = 2^m n$]


$$ \psi_{0,0}(t) = \psi(t) $$

and $a_0$ and $\tau_0$ are constants that determine the sampling intervals.

A practical sampling scheme is $a = 2^m$, $\tau = n 2^m$, i.e. $a_0 = 2$ and $\tau_0 = 1$, so that
eqn (2.11) becomes

$$ \psi_{m,n}(t) = 2^{-m/2}\, \psi(2^{-m} t - n) \qquad (2.12) $$

With this octave scaling and dyadic translation, the sampled values of $(a, \tau)$ are as shown
in the dyadic grid of Fig. 2.1. Since the Fourier transform of $\psi(t/a)/\sqrt{a}$ is
$\sqrt{a}\, \Psi(a\omega)$, the centre frequency and bandwidth of a wavelet are both scaled by
$1/a$ for a time dilation by $a$. Thus the Q of all baby wavelets is

$$ Q = \frac{\text{centre frequency}}{\text{bandwidth}} = \text{constant} \qquad (2.13) $$

giving rise to the so-called constant-Q analysis capability of wavelets. The frequency
resolution decreases with increasing centre frequency.
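
As a worked check of the constant-Q property, suppose (purely as an illustrative assumption) that the mother wavelet spectrum $\Psi(\omega)$ is centred at $\omega_0$ with bandwidth $\Delta\omega$. For the dilated wavelet $\psi(t/a)/\sqrt{a}$ used in eqn (2.4), the scaling property of the Fourier transform gives:

```latex
\mathcal{F}\left\{ \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t}{a}\right) \right\}(\omega)
  = \sqrt{a}\, \Psi(a\omega)
\quad\Longrightarrow\quad
\omega_0 \mapsto \frac{\omega_0}{a}, \qquad
\Delta\omega \mapsto \frac{\Delta\omega}{a}, \qquad
Q = \frac{\omega_0 / a}{\Delta\omega / a} = \frac{\omega_0}{\Delta\omega}
```

On the dyadic grid $a = 2^m$, each increase of $m$ by one therefore halves both the centre frequency and the bandwidth, and their ratio stays fixed.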
2.3 Multiresolution Analysis (MRA)

Multiresolution analysis is widely used in computer vision for the purpose of pattern
recognition. In this section we consider in brief the relation between multiresolution
analysis and the wavelet transform. A more detailed description can be found in [2].

Let $L^2(R)$ be the Hilbert space of measurable, square-integrable 1-D functions $f(x)$.
Multiresolution analysis consists of breaking up $L^2(R)$ into a ladder of spaces such that

$$ \cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots \qquad (2.14) $$

with the properties that if $f(x) \in V_j$ then $f(x - 2^j k) \in V_j$, $k \in Z$, and
$f(2x) \in V_{j-1}$. Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j-1}$. This is written as

$$ V_{j-1} = V_j \oplus W_j \qquad (2.15) $$

Multiresolution analysis has the following further properties:

1. $$ f(x) \in V_j \iff f(2^j x) \in V_0 \qquad (2.16) $$

2. There exists a scaling function $\phi(x)$ such that $\{\phi_{0,n}(x)\}_{n \in Z}$ is an orthogonal
basis of $V_0$, where $\phi_{0,n}(x) = \phi(x - n)$. This also implies that
$\{\phi_{j,n}(x) = 2^{-j/2} \phi(2^{-j} x - n)\}_{n \in Z}$ forms a basis of $V_j$.

3. As explained above, if $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$, then there
exists a function $\psi(x)$, called the wavelet, such that $\{\psi_{j,n}(x)\}$ is an orthogonal
basis of $W_j$, where

$$ \psi_{j,n}(x) = 2^{-j/2}\, \psi(2^{-j} x - n) \qquad (2.17) $$

4. Let $A_j$ denote the orthogonal projection operator on $V_j$, i.e. the projection of $f(x)$ on
$V_j$ is $A_j f(x)$ (also called the approximate signal at level $j$). Then $A_j f(x)$ is completely
characterized by the sequence $\{a_{j,n}\}$, where

$$ a_{j,n} = \langle f(x), \phi_{j,n}(x) \rangle \qquad (2.18) $$

5. The approximation at resolution level $j$ is a coarser approximation than the
approximation at resolution level $j - 1$. The resulting loss of information is given
by the orthogonal projection of $f(x)$ on $W_j$ and is called the detail signal at resolution
level $j$. Let $D_j$ denote the orthogonal projection operator on the subspace $W_j$.
Then $D_j f(x)$ is completely characterized by the sequence $\{d_{j,n}\}$, where

$$ d_{j,n} = \langle f(x), \psi_{j,n}(x) \rangle \qquad (2.19) $$

6. The functions $\phi_{j,n}(x)$ and $\psi_{j,n}(x)$ satisfy the following relations:

$$ \langle \phi_{j,k}(x), \phi_{j,m}(x) \rangle = \delta_{k-m} \qquad (2.20) $$

$$ \langle \psi_{j,k}(x), \phi_{j,m}(x) \rangle = 0 \qquad (2.21) $$

$$ \langle \psi_{j,k}(x), \psi_{j,n}(x) \rangle = \delta_{k-n} \qquad (2.22) $$

From the above discussion it is clear that $\{\psi_{j,n}\}_{j,n \in Z}$ is an orthogonal basis of
$L^2(R)$. Mallat has shown [2] how to obtain the sequences $\{a_{j,n}\}$ and $\{d_{j,n}\}$ given the
sequence $\{a_{j-1,n}\}$, and vice versa. Fig. 2.2 shows the scheme for obtaining $\{a_j\}$ and
$\{d_j\}$ from $\{a_{j-1}\}$, and Fig. 2.3 shows the reverse scheme for obtaining $\{a_{j-1}\}$ from
$\{a_j\}$ and $\{d_j\}$.

The schemes of Figs. 2.2 and 2.3 are the same as the analysis and synthesis filter banks of a
2-channel quadrature mirror filter (QMF). For perfect reconstruction the $G$ and $H$ filters
should satisfy the following conditions:


$$ |H(0)| = 1 \qquad (2.23) $$

$$ |H(\omega)|^2 + |H(\omega + \pi)|^2 = 1 \qquad (2.24) $$

$$ H(Z)\, G(Z^{-1}) + H(-Z)\, G(-Z^{-1}) = 0 \qquad (2.25) $$

where

$$ H(Z) = \sum_{n \in Z} h(n)\, z^{-n} \qquad (2.26) $$

$$ G(Z) = \sum_{n \in Z} g(n)\, z^{-n} \qquad (2.27) $$

$$ H(\omega) = H(Z)\big|_{z = e^{j\omega}} \qquad (2.28) $$

$$ G(\omega) = G(Z)\big|_{z = e^{j\omega}} \qquad (2.29) $$

H{^ = (2 30) 

= Gi-Z-'^) (2 31) 

The concept of multiresolution can be extended to 2-D signals [2] 

2.4 Scaling Function and Wavelet Obtained from Iterated Filters

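The scaling function $\phi(t)$ and the wavelet $\psi(t)$ can be represented in terms of the low
pass and high pass filters [1]. They are represented by

$$ \phi(t) = \sqrt{2} \sum_{l} h(l)\, \phi(2t - l) \qquad (2.32) $$

and

$$ \psi(t) = \sqrt{2} \sum_{l} g(l)\, \phi(2t - l) \qquad (2.33) $$

where $h$ and $g$ are the low pass and high pass filters respectively, explained in section 2.3.
Let $p$ be the filter length. Then the above eqns (2.32) and (2.33) can be written as

$$ \phi(t) = \sqrt{2} \sum_{l=0}^{p-1} h(l)\, \phi(2t - l) \qquad (2.34) $$

$$ \psi(t) = \sqrt{2} \sum_{l=0}^{p-1} g(l)\, \phi(2t - l) \qquad (2.35) $$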
[Figure 2.4: Scaling function for Daub-4 filter]

$\phi(t)$ is zero outside the interval from $t = 0$ to $t = p - 1$ [1], and since it is a
continuous function, it must be zero at $t = 0$ and $t = p - 1$. Hence for an integer $j$,
$\phi(j)$ is nonzero only for $j \ge 1$ and $j \le p - 2$. Evaluated at these integers, eqns
(2.34) and (2.35) constitute $p - 2$ linear equations, which can be solved iteratively to
obtain $\phi(t)$ and $\psi(t)$.
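
The procedure above can be sketched numerically. The following is a minimal illustration, not the thesis implementation: it assumes the well-known Daubechies length-4 coefficients, finds $\phi$ at the interior integers as the eigenvector (eigenvalue 1) of the $p - 2$ linear equations, and then applies eqn (2.34) repeatedly to fill in the dyadic points. The function name `scaling_function` and the normalization $\sum_j \phi(j) = 1$ are assumptions made for this example.

```python
import numpy as np

# Daubechies length-4 low pass filter (assumed here; normalized so sum(h) = sqrt(2)).
S3 = np.sqrt(3.0)
D4 = np.array([1 + S3, 3 + S3, 3 - S3, 1 - S3]) / (4 * np.sqrt(2.0))

def scaling_function(h, levels):
    """Sample phi on the grid t = k / 2**levels for 0 <= t <= p - 1."""
    h = np.asarray(h, dtype=float)
    p = len(h)
    c = np.sqrt(2.0) * h                      # phi(t) = sum_l c[l] * phi(2t - l)
    # Step 1: phi at the interior integers 1 .. p-2 solves v = M v.
    ints = np.arange(1, p - 1)
    M = np.zeros((p - 2, p - 2))
    for a, i in enumerate(ints):
        for b, j in enumerate(ints):
            l = 2 * i - j
            if 0 <= l < p:
                M[a, b] = c[l]
    w, v = np.linalg.eig(M)
    vec = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    vec = vec / vec.sum()                     # assumed normalization: sum_j phi(j) = 1
    # Step 2: refine to half-integers, quarter-integers, ... with eqn (2.34).
    step = 2 ** levels
    last = (p - 1) * step
    phi = np.zeros(last + 1)
    phi[ints * step] = vec
    for lev in range(1, levels + 1):
        stride = step // 2 ** lev
        for k in range(stride, last, 2 * stride):   # new grid points at this level
            total = 0.0
            for l in range(p):
                m = 2 * k - l * step                # grid index of 2t - l
                if 0 <= m <= last:
                    total += c[l] * phi[m]
            phi[k] = total
    return np.arange(last + 1) / step, phi

t, phi = scaling_function(D4, levels=4)
```

For the length-4 filter only $\phi(1)$ and $\phi(2)$ are nonzero at the integers, so the linear system is 2 x 2; longer filters only enlarge the matrix `M`.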

The scaling functions and wavelets for different lengths of the Daubechies filter [5] are
shown in Figs. 2.4 to 2.7. The filter coefficients are given in [6]. Figs. 2.4 and 2.5 show
the scaling function and wavelet for filter length 4. Figs. 2.6 and 2.7 show the scaling
function and wavelet for filter length 10.

It is seen that the wavelet and scaling function are, in general, nonsymmetric. This results
in nonlinear phase filters in the analysis and synthesis filter banks. By adopting bi-orthogonal
filters, the condition of linear phase can be satisfied. However, this is at the cost of using
separate scaling functions and wavelets in the analysis and synthesis filter banks. For more
details refer to [14].
Figure 2.5: Wavelet for Daub-4 filter

Figure 2.6: Scaling Function for Daub-10 filter

Figure 2.7: Wavelet for Daub-10 filter


2.5 Fast Wavelet Transform 

In practice the signal $f(x)$ is sampled at a finite sampling rate to obtain the sequence
$\{u(n)\}$, $u(n) = f(nT)$. These samples are taken to be the orthogonal projection of $f(x)$
on $V_0$, i.e. $a_{0,n} = u(n)$. The wavelet coefficients $d_{j,n}$, $1 \le j \le L$, can be
obtained using the pyramidal algorithm shown in Fig. 2.8. Here $L$ represents the number of
levels ($L \le N$, where $2^N$ is the signal length).

The signal can be perfectly reconstructed from its wavelet coefficients. The reconstruction
scheme is shown in Fig. 2.9.

Together, the schemes of Figs. 2.8 and 2.9 constitute the fast wavelet transform.
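
The pyramidal analysis and synthesis can be sketched in a few lines. This is only an illustration under a simplifying assumption: it uses the orthonormal Haar filter pair (length 2) instead of the Daubechies filters used in this work, so that filtering and decimation reduce to operations on adjacent sample pairs. The function names `analysis` and `synthesis` are hypothetical.

```python
import numpy as np

# Orthonormal Haar pair (assumed for simplicity; not the filters used in the thesis).
H = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low pass
G = np.array([1.0, -1.0]) / np.sqrt(2.0)   # high pass

def analysis(u, levels):
    """Return (a_L, [d_1, ..., d_L]) for a signal whose length is a multiple of 2**levels."""
    a = np.asarray(u, dtype=float)
    details = []
    for _ in range(levels):
        pairs = a.reshape(-1, 2)           # filtering + decimation by 2 on sample pairs
        details.append(pairs @ G)          # detail signal d_j
        a = pairs @ H                      # approximate signal a_j
    return a, details

def synthesis(a, details):
    """Invert analysis(): rebuild a_{j-1} from a_j and d_j, coarsest level first."""
    a = np.asarray(a, dtype=float)
    for d in reversed(details):
        a = (np.outer(a, H) + np.outer(d, G)).reshape(-1)
    return a

u = np.arange(16, dtype=float)
a3, ds = analysis(u, levels=3)
u_rec = synthesis(a3, ds)
```

Each level halves the number of approximation samples, so the total work is proportional to the signal length, which is what makes the transform fast.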
Figure 2.8: Pyramidal algorithm for Analysis

Figure 2.9: Pyramidal algorithm for Synthesis















Chapter 3 


Methods of Analysis 


Three different methods have been explored for analysis purposes. They are described in
brief in the following sections.


3.1 Analysis Through Classification 

3.1.1 Introduction 

Classification is attempted through a sequence of stored parameters that have previously
been obtained through a learning process. Typically this process may be divided into two
stages. The first is the feature extraction stage, wherein short-time temporal or spectral
parameters of speech are extracted. The second is the classification stage, wherein the
derived parameters are compared with the stored reference parameters and decisions are
made based on some kind of minimum distance rule. We have used the Discrete Wavelet
Transform as the feature extraction tool. The theory of the wavelet transform has been
explained in Chapter 2. With the wavelet transform we have used the k-means [13] algorithm
for finding the reference parameters.
3.1.2 Classification Method

k-means algorithms comprise a very powerful class of clustering and segmentation methods [17].
Variants of the basic algorithm have found use in a large variety of applications, including
compression and representation of speech data. A prime example is the vector quantization
(VQ) of speech signals.

The training data are segmented into $k$ clusters, each of which is represented by a
reference vector. These reference vectors are chosen such that the average of the distances
between each element of the training data and its nearest reference vector is minimized.
Let $y_1, y_2, \ldots, y_m \in R^n$ comprise the training data set that we wish to segment into
$k$ clusters, where $2 < k < m$. We minimize

$$ \varepsilon(W, R) = \sum_{i=1}^{k} \sum_{j=1}^{m} w_{ij}\, d(y_j, r_i) \qquad (3.1) $$

subject to

$$ \sum_{i=1}^{k} w_{ij} = 1, \qquad j = 1, 2, 3, \ldots, m $$

$$ w_{ij} = 0 \text{ or } 1, \qquad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, m $$

$$ R = [r_1, r_2, \ldots, r_k] $$

$W = [w_{ij}]$ is a $k \times m$ matrix that defines the class membership: $w_{ij} = 1$ indicates
that $y_j$ belongs to cluster $i$, or more accurately, to the cluster represented by $r_i$, if

$$ d(y_j, r_i) < d(y_j, r_l), \qquad l \ne i $$

$d(\cdot, \cdot)$ is a distance measure, defined as $d(y_j, r_i) = (y_j - r_i)^T (y_j - r_i)$. $R$ is
the collection of the cluster centroids or reference vectors. The problem in (3.1) is to
determine the cluster memberships and cluster prototypes such that the distortion error
$\varepsilon$ is minimized. Classification is performed by assigning the test token to the class
of the nearest prototype. For this reason, it is also called a 1-nearest neighbour (1-NN)
classifier.
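
A minimal sketch of this clustering and classification scheme follows. It is an illustration only: the alternating assign/update iteration (Lloyd's algorithm), the random initialization from the training data, and the names `kmeans` and `classify` are assumptions, since the text does not specify how (3.1) was minimized in practice.

```python
import numpy as np

def kmeans(Y, k, iters=50, seed=0):
    """Approximately minimize (3.1). Y is an (m, n) array; returns (R, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct training vectors chosen at random.
    R = Y[rng.choice(len(Y), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Membership step: squared Euclidean distance d(y_j, r_i) to every centroid.
        dist = ((Y[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Prototype step: each reference vector becomes the mean of its cluster.
        for i in range(k):
            members = Y[labels == i]
            if len(members):
                R[i] = members.mean(axis=0)
    dist = ((Y[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    return R, dist.argmin(axis=1)

def classify(token, prototypes):
    """1-NN rule: return the class whose nearest prototype is closest to the token."""
    return min(prototypes,
               key=lambda c: ((prototypes[c] - token) ** 2).sum(axis=1).min())

Y = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
R, labels = kmeans(Y, k=2)
```

In the setting of this chapter, `prototypes` would map each consonant to its stored templates, and `token` would be the wavelet coefficient vector of the test utterance at one level.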
3.2 Waveletogram

As explained in the previous chapter, $d_{j,n}$ represents the wavelet coefficients; at
level $L$, $d_{L,n}$ is the detail signal and $a_{L,n}$ is the approximate signal. For simplicity
we set $d_{L+1,n} = a_{L,n}$ and denote the wavelet coefficients by $d_{j,n}$, $1 \le j \le L + 1$,
with $n = 0, 1, 2, \ldots, 2^{N-j} - 1$ for $j \ne L + 1$ and $n = 0, 1, 2, \ldots, 2^{N-L} - 1$ for
$j = L + 1$. The two-dimensional array $d$ has a triangular structure.

The plot of $\{|d_{j,n}|\}$ in the $(j, n)$ plane corresponds to the spectrogram and is called the
waveletogram [8]. The usefulness of the waveletogram lies in the fact that $|d_{j,n}|^2$ measures
the energy of the signal at scale $2^j$ and location $2^j n$. For an orthogonal basis,
$E = \sum_j \sum_n |d_{j,n}|^2$ is precisely the energy $E = \|f(t)\|^2$ of the signal, where
$2^N$ is the length of the signal.

Since the 2-D array is triangular in nature, it is difficult to plot. To overcome
this difficulty we define an E-matrix of size $(L + 1) \times 2^{N-1}$, i.e. (levels + 1) x (half
the length of the original signal). The elements of the E-matrix are given by

$$ E_{0,\, 2^{L-1} k + n} = |d_{L+1,k}| \qquad (3.2) $$

with $k = 0, 1, 2, \ldots, 2^{N-L} - 1$ and $n = 0, 1, 2, \ldots, 2^{L-1} - 1$, for $j = L + 1$, and

$$ E_{L+1-j,\, 2^{j-1} k + n} = |d_{j,k}| \qquad (3.3) $$

with $k = 0, 1, 2, \ldots, 2^{N-j} - 1$ and $n = 0, 1, 2, \ldots, 2^{j-1} - 1$, for $j = L, L - 1, \ldots, 1$.

The coefficients of the E-matrix are mapped onto a 0 to 255 gray scale. The resulting matrix
plot is the waveletogram which is used in our analysis.
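
The construction of the E-matrix in eqns (3.2) and (3.3) can be sketched as below. This is an illustrative reading of the definitions: the function name `waveletogram` is hypothetical, and the linear mapping of magnitudes to the 0 to 255 gray scale is an assumption, since the mapping actually used is not stated.

```python
import numpy as np

def waveletogram(details, approx):
    """Build the (L+1) x 2**(N-1) E-matrix.

    details = [d_1, ..., d_L], where d_j holds 2**(N-j) coefficients;
    approx  = a_L (treated as d_{L+1}), holding 2**(N-L) coefficients.
    """
    L = len(details)
    width = len(details[0])                       # 2**(N-1) columns
    E = np.zeros((L + 1, width))
    # Eqn (3.2): row 0 holds |a_L|, each value repeated 2**(L-1) times.
    E[0] = np.repeat(np.abs(approx), 2 ** (L - 1))
    # Eqn (3.3): row L+1-j holds |d_j|, each value repeated 2**(j-1) times.
    for j in range(1, L + 1):
        E[L + 1 - j] = np.repeat(np.abs(details[j - 1]), 2 ** (j - 1))
    # Assumed gray-scale mapping: linear, with the largest magnitude at 255.
    peak = E.max()
    gray = np.zeros_like(E) if peak == 0 else np.round(255 * E / peak)
    return E, gray.astype(np.uint8)

d1 = np.array([1.0, -2.0, 3.0, -4.0])
d2 = np.array([5.0, -6.0])
a2 = np.array([7.0, 8.0])
E, gray = waveletogram([d1, d2], a2)
```

Repeating each coefficient over the span of samples it covers turns the triangular array into a rectangular image in which coarse levels appear as wide blocks and fine levels as narrow ones.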
3.3 Reconstruction from Multiscale Edges

3.3.1 Introduction

Points of sharp variation are often among the most important features for analyzing the
properties of transient signals and images. In images, they are generally located at
the boundaries of important image structures. A sharp variation of the signal produces modulus
maxima at different scales $2^j$ of its wavelet transform. Mallat [4] has shown that an
almost exact reproduction of the signal is possible from the modulus maxima of its wavelet
transform. That means a signal can be reconstructed from the values of the maxima
and the places of their occurrence at the dyadic scales $2^j$. Mallat's algorithm for the
reconstruction of a 1-D signal is described in the next section. It can also be extended to
2-D images [4].

3.3.2 Reconstruction Algorithm 

Let $f(x) \in L^2(R)$ and let $(W_{2^j} f(x))_{j \in Z}$ be its dyadic wavelet transform. We describe an
algorithm that reconstructs an approximation of $(W_{2^j} f(x))_{j \in Z}$, given the positions of
the local maxima of $|W_{2^j} f(x)|$ and the values of $W_{2^j} f(x)$ at these locations. For this
purpose we characterize the set of functions $h(x)$ such that, at each scale $2^j$, the modulus
maxima of $W_{2^j} h(x)$ are the same as the modulus maxima of $W_{2^j} f(x)$. Let
$(x_n^j)_{n \in Z}$ be the abscissae where $|W_{2^j} f(x)|$ is locally maximum. The maxima
constraints on $W_{2^j} h(x)$ can be decomposed into two conditions:

1. At each scale $2^j$, for each local maximum located at $x_n^j$, $W_{2^j} h(x_n^j) = W_{2^j} f(x_n^j)$.

2. At each scale $2^j$, the local maxima of $|W_{2^j} h(x)|$ are located at the abscissae $(x_n^j)_{n \in Z}$.

Condition (1) is equivalent to

$$ \langle f(u), \psi_{2^j}(x_n^j - u) \rangle = \langle h(u), \psi_{2^j}(x_n^j - u) \rangle \qquad (3.4) $$

where $\psi_{2^j}$ denotes the wavelet dilated to the scale $2^j$. In general, eqn (3.4) does not
characterize $f(x)$ uniquely: functions $h(x)$ other than $f(x)$ can satisfy it [4].
Condition (2) is more difficult to analyze because it is not convex. In order to solve this
problem numerically, we approximate condition 2 with a convex constraint. Condition 1
defines the value of the wavelet transform at the points $(x_n^j)_{(j,n) \in Z^2}$. Instead of
imposing that the local maxima of $|W_{2^j} h(x)|$ are located at these points, we impose that
$|W_{2^j} h(x)|^2$ be as small as possible on average. This generally creates local modulus
maxima at the positions $(x_n^j)$. The number of modulus maxima of $W_{2^j} h(x)$ depends on how
much this function oscillates. To have as few modulus maxima as possible outside the abscissae
$(x_n^j)$, we also minimize the energy of the derivative of $W_{2^j} h(x)$. Since these conditions
must be imposed at all scales $2^j$, we minimize

$$ |||h|||^2 = \sum_{j=-\infty}^{+\infty} \left( \|W_{2^j} h\|^2 + 2^{2j} \left\| \frac{dW_{2^j} h}{dx} \right\|^2 \right) \qquad (3.5) $$

Let us now describe an algorithm for our minimization problem. Instead of computing
the solution itself, we reconstruct its wavelet transform with an algorithm based on
alternate projections. Let $K$ be the space of all sequences of functions $(g_j(x))_{j \in Z}$ such that

$$ |||(g_j(x))_{j \in Z}|||^2 = \sum_{j=-\infty}^{+\infty} \left( \|g_j\|^2 + 2^{2j} \left\| \frac{dg_j}{dx} \right\|^2 \right) < +\infty \qquad (3.6) $$

The norm $|||\cdot|||$ defines a Hilbert structure over $K$. Let $V$ be the space of all dyadic wavelet
transforms of functions in $L^2(R)$. It can be proved that $V$ is included in $K$. Let $\Gamma$ be
the affine space of sequences of functions $(g_j(x))_{j \in Z} \in K$ such that for any index $j$ and all
maxima positions $x_n^j$,

$$ g_j(x_n^j) = W_{2^j} f(x_n^j) \qquad (3.7) $$

One can prove that $\Gamma$ is closed in $K$. The dyadic wavelet transforms that satisfy condition
(1) are the sequences of functions that belong to

$$ \Lambda = V \cap \Gamma \qquad (3.8) $$

We must therefore find the element of $\Lambda$ whose norm $|||\cdot|||$ is minimum. This is done by
alternating projections on $V$ and $\Gamma$.

Any dyadic wavelet transform is invariant under the operator [4]

$$ P_V = W \circ W^{-1} \qquad (3.9) $$

(where $W^{-1}$ is the inverse wavelet transform).

For any sequence $X = (g_j(x))_{j \in Z} \in K$, it is clear that $P_V X \in V$; therefore $P_V$ is a
projector on $V$. $P_V$ is self-adjoint and orthogonal for a certain kind of wavelet (explained in
Chapter 2). The orthogonal projection on the space $V$ is thus implemented by applying the
operator $W^{-1}$ followed by the operator $W$. $P_\Gamma$ is the projection operator on the affine
set $\Gamma$ which is orthogonal with respect to the norm $|||\cdot|||$. Mallat [4] has proved that
the projection operator $P_\Gamma$ is implemented by adding piecewise exponential curves to each
function of the sequence that we project on $\Gamma$. Appendix A characterizes the projection on
the affine set $\Gamma$, which is orthogonal with respect to the norm $|||\cdot|||$. Let
$P = P_V \circ P_\Gamma$ be the alternate projections on both spaces, and let $P^{(n)}$ be $n$
iterations of the operator $P$. Since $\Gamma$ is an affine space and $V$ a Hilbert space, a
classical result on alternate projections [11] proves that for any sequence of functions
$X = (g_j(x))_{j \in Z} \in K$,

$$ \lim_{n \to +\infty} P^{(n)} X = P_\Lambda X \qquad (3.10) $$

Alternate projections on $\Gamma$ and $V$ converge strongly to the orthogonal projection on
$\Lambda$. If $X$ is the zero element of $K$, which means that $g_j(x) = 0$ for all $j \in Z$, the
alternate projections converge to the element of $\Lambda$ which is closest to zero, and thus
whose norm $|||\cdot|||$ is minimum. Figure 3.1 illustrates how the approximation of the wavelet
transform of $f(x)$ is reconstructed by alternating orthogonal projections on the affine set
$\Gamma$ and on the space $V$ of all dyadic wavelet transforms. The projections begin from the
zero element and converge to its orthogonal projection on $\Gamma \cap V$.

Figure 3.1: Alternate projection on space V and space $\Gamma$
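
The alternate-projection idea can be illustrated in finite dimensions. The sketch below is a toy analogue, not Mallat's algorithm: the space V is taken to be the column space of a matrix `Q` with orthonormal columns, the affine set of maxima constraints is replaced by the set of vectors with prescribed values `vals` at the indices `idx`, and the ordinary Euclidean norm stands in for the norm of eqn (3.6), so the affine projection is a simple overwrite instead of piecewise exponential curves. All names are hypothetical.

```python
import numpy as np

def alternate_projections(Q, idx, vals, n_iter=200):
    """Start from zero and alternate P_Gamma (fix entries) and P_V (project on span Q)."""
    x = np.zeros(Q.shape[0])
    for _ in range(n_iter):
        x[idx] = vals            # projection on the affine set: impose the constraints
        x = Q @ (Q.T @ x)        # orthogonal projection on the subspace V
    return x

# V = {(a, a, b, b)}; the affine set fixes the first entry to 2.
Q = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]]) / np.sqrt(2.0)
x = alternate_projections(Q, idx=[0], vals=[2.0])
```

In this example the iterates approach (2, 2, 0, 0), the minimum-norm vector lying in both sets, which mirrors the statement that starting from the zero element the iteration converges to the element of the intersection closest to zero.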


3.3.3 Numerical Reconstruction of 1-D Signals from Local Maxima

From the results of Meyer [12], we know that in general we cannot reconstruct a function
exactly from the modulus maxima of its wavelet transform. Mallat's algorithm approximates
this inverse problem by replacing the maxima constraint with the minimization
of a norm that yields a unique solution. We thus do not converge toward the wavelet
transform of the original signal but toward some other wavelet transform that we hope is
close to the original one. The computation of the solution might be unstable, in which
case the alternate projections converge very slowly. Hence a larger number of iterations is
required for a solution close to the original signal.



Chapter 4 


Results and Discussions 


4.1 Data Set and the Experiment 

In order to study the features of stop consonants, data corresponding to the three voiceless
stop consonants /p, k, t/ were collected. Data from a single male speaker were used as the speech
data base. They consist of 19 /k/'s, 21 /t/'s and 19 /p/'s, 59 in all. These were next
passed through a burst detector (a simple energy thresholding), and the first 100 ms after
the burst were retained for further processing. At 8 kHz sampling rate, this produces
800-sample data records. The wavelet transform of each record was then computed for number
of levels L = 5. Five repetitions of each utterance were chosen at random for the training set.
Ten templates per level were obtained for each utterance using the k-means algorithm. In our
case we have 50 templates for the detail signals and 10 templates for the coarser signal at
level 5. Classification is done through template matching (wavelet coefficients are compared
with the stored templates level-wise), with the minimum distance rule as the method of
classification: the test token is assigned to the group of templates which produces the
minimum distortion. Correct classification scores are shown in Table 4.1. It is seen in the
table that the poorest classification is for /p/ and the overall classification is 83% (49/59).
Higher scores are expected if learning vector quantization (LVQ) [16] is used instead of
vector quantization. Vector quantization minimizes the average distortion error and does not
approximate the optimum decision boundaries between the different classes. This drawback is
taken care of in LVQ.

              /k/      /p/      /t/      Total
  Score       89.5%    73.7%    85.7%    83%
  Correct     17/19    14/19    18/21    49/59

Table 4.1: Classification Score
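
The burst detection step can be sketched as follows. The text only states that a simple energy threshold was used, so every detail here is an assumption made for illustration: the 5 ms frame length, the threshold set as a fraction of the peak frame energy, and the function name `segment_after_burst`.

```python
import numpy as np

def segment_after_burst(x, fs=8000, frame_ms=5.0, ratio=0.1, keep_ms=100.0):
    """Return the samples of the first keep_ms after the first high-energy frame.

    A frame is flagged when its energy exceeds ratio * (peak frame energy).
    Returns None when the signal is too short or contains no energy.
    """
    x = np.asarray(x, dtype=float)
    frame = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame
    if n_frames == 0:
        return None
    energy = (x[:n_frames * frame].reshape(n_frames, frame) ** 2).sum(axis=1)
    above = np.nonzero(energy > ratio * energy.max())[0]
    if len(above) == 0:
        return None
    start = above[0] * frame
    return x[start:start + int(fs * keep_ms / 1000)]

# Synthetic check: silence followed by a constant-amplitude "burst" at sample 1000.
x = np.zeros(4000)
x[1000:] = 1.0
segment = segment_after_burst(x)
```

At the 8 kHz sampling rate used here, 100 ms corresponds to 800 samples, which matches the record length quoted above.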


4.2 Feature Extraction from Waveletogram 

Six stop consonants /k, p, t, b, d, g/ were spoken by a single male speaker in VCV syllables.
The waveletogram along with the time plot for each stop consonant is shown in Figs. 4.1
to 4.6. We tried to establish a one-to-one correspondence between waveletogram and
spectrogram, but did not succeed. The figures show that the waveletograms are
certainly different from one another. In fact, we were looking for explicit time information
of the kind obtained from a spectrogram. We then tried modified visualizations of the
waveletogram, using thresholding, smoothing, averaging, etc. The averaged waveletogram of
each stop consonant is shown in Figs. 4.7 to 4.12. In this trial also we could not get explicit
time information. Observing the modified waveletograms of /k, g/, /p, b/ and /t, d/, we
find some similarity within each pair. This similarity is indicative of the common place of
articulation of each pair.
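A minimal sketch of how a thresholded and smoothed waveletogram might be computed is given below. It rests on several assumptions: the waveletogram is taken to be the modulus of the detail coefficients arranged on a level-by-time grid, the Haar wavelet is used only to keep the code short (it is a stand-in, not the wavelet of this work), and the function names and parameter values are hypothetical.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def waveletogram(x, levels):
    """Rows are levels (finest first), columns are time samples. Each |detail|
    coefficient of level j is repeated 2**j times so all rows share one time axis.
    len(x) is assumed to be a multiple of 2**levels."""
    rows, approx = [], list(x)
    for j in range(1, levels + 1):
        approx, detail = haar_step(approx)
        row = []
        for d in detail:
            row.extend([abs(d)] * (2 ** j))
        rows.append(row)
    return rows

def threshold(rows, frac):
    """Set to zero every entry smaller than frac times the global maximum."""
    peak = max(max(r) for r in rows)
    return [[v if v >= frac * peak else 0.0 for v in r] for r in rows]

def smooth(rows, width):
    """Moving average along time; the window is truncated at the edges."""
    half = width // 2
    out = []
    for r in rows:
        smoothed = []
        for i in range(len(r)):
            window = r[max(0, i - half):i + half + 1]
            smoothed.append(sum(window) / len(window))
        out.append(smoothed)
    return out

# Toy input (not thesis data): one impulse in a 16-sample record, 3 levels.
x = [0.0] * 16
x[5] = 1.0
rows = waveletogram(x, levels=3)
modified = smooth(threshold(rows, frac=0.6), width=3)
```

Because each coarser row covers twice as many samples per coefficient, the time resolution halves from one level to the next, which is one way to see why explicit timing is harder to read off than from a spectrogram with a fixed analysis window.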
4.3 Simulation Results of Reconstruction Algorithm

The reconstruction algorithm of Section 3.3.2 was tested on the data shown in Fig. 4.13.
This data is a 64-point segment of a speech signal. The wavelet transform
was applied to this data, and the extreme points (maxima and minima) and their positions
were calculated for each level. The reconstruction algorithm was then applied
to the resulting data; the reconstructed signals are shown in Figs. 4.14 and 4.15. Fig. 4.14
is the plot of the signal obtained after 20 iterations. The SNR is 25 dB. The
signal obtained after 50 iterations is given in Fig. 4.15. Looking at Fig. 4.15 and the
original signal, one can hardly see any difference. For easy comparison they are plotted
on the same graph in Fig. 4.16.
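The text does not state how the SNR was computed. A common definition, assumed here, is the ratio of the energy of the original signal to the energy of the reconstruction error, expressed in dB. The function name `snr_db` is hypothetical.

```python
import math

def snr_db(original, reconstructed):
    """SNR in dB, assumed here to be 10*log10(signal energy / error energy)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)

# Toy check (not thesis data): a uniform 10% amplitude error gives about 20 dB.
example = snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9])
```

Under this definition, an SNR of 25 dB corresponds to an error energy of roughly 0.3% of the signal energy.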

The reconstruction algorithm was also applied to the stop consonant /k/, whose time plot is
given in Fig. 4.17. Since the speech data is very long (8000 samples, recorded
for 1 s at a sampling frequency of 8 kHz), we apply the reconstruction algorithm frame
by frame. We have chosen 256 as the frame length. Figs. 4.18 and 4.19 are the plots of the
reconstructed signal after 20 and 50 iterations respectively.
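Frame-wise processing of this kind can be sketched as below. The sketch assumes non-overlapping frames and zero padding of the last, partial frame; the text does not say how the final frame was handled. The callback `process` is a hypothetical placeholder for the reconstruction algorithm.

```python
def process_framewise(signal, frame_len, process):
    """Apply `process` to consecutive non-overlapping frames and concatenate.
    The last frame is zero-padded to frame_len and trimmed back afterwards."""
    out = []
    for start in range(0, len(signal), frame_len):
        frame = list(signal[start:start + frame_len])
        valid = len(frame)
        frame.extend([0.0] * (frame_len - valid))
        out.extend(process(frame)[:valid])
    return out

# Toy usage: an 8000-sample record and an identity "algorithm" that only
# records the length of every frame it receives.
signal = [float(n % 7) for n in range(8000)]
calls = []

def identity(frame):
    calls.append(len(frame))
    return frame

rebuilt = process_framewise(signal, 256, identity)
```

With 8000 samples and a frame length of 256, this gives 31 full frames plus one frame holding the remaining 64 samples.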

The reconstructed signals are not equal to the original signal but are numerically very
close. They have no spurious oscillations and show the same types of sharp variations, as seen in
Figs. 4.17 and 4.19. Qualitatively, the reconstructed signals are thus very similar to the
original one. Errors are hardly noticeable when they are compared with the graph shown in Fig. 4.17.
We have no upper bound on the error, that is, on the distance between the signal to which
we converge and the original signal. This is an open mathematical problem, but the
numerical precision of this reconstruction algorithm is sufficient for many signal processing
applications, where the concerns are efficiency of representation (data compression), perceptual
relevance, robustness to noise, etc.
Figure 4.7 Modified Waveletogram of /k/

Figure 4.8 Modified Waveletogram of /p/

Figure 4.9 Modified Waveletogram of /t/

Figure 4.14 Signal Reconstructed after 20 Iterations

Figure 4.15 Signal Reconstructed after 50 Iterations

Figure 4.16 Comparison of Original Signal and Reconstructed Signal after 50 Iterations

Figure 4.17 Time plot of /k/

Figure 4.18 Signal Reconstructed after 20 Iterations

Figure 4.19 Signal Reconstructed after 50 Iterations



Chapter 5 


Conclusion and Suggestion for 
Future Work 


The wavelet transform has become a powerful tool in the area of signal processing. The present
thesis is an attempt to utilize this tool for the analysis and characterization of the stop consonants
/k, p, t, b, d, g/. Three methods of analysis have been explored.

Analysis through classification is based on the k-means clustering method. In this method, 100
ms of the speech signal after the burst was analyzed. The wavelet transform was then applied to
the resulting signal. Five repetitions of each voiceless stop consonant /k, p, t/ were used in the
training data set, and 10 parameters per level were obtained through the clustering method.
Five levels of the wavelet transform were chosen. Hence 50 parameters for the detail signals and
10 parameters for the coarse signal constitute 60 parameters per stop consonant. Each stop
consonant was then classified in terms of these stored parameters, and 83% correct
classification was achieved. Learning Vector Quantization (LVQ) would be expected to produce better
results, because this method optimizes the boundaries between the different classes
instead of the average distortion.

Our effort to extract explicit time information such as voice onset time (VOT), place of
occurrence of the burst, etc. was not successful. In fact, we tried to establish a one-to-one
correspondence between spectrogram and waveletogram, but failed to do so. The frequency division on
a dyadic scale (in the waveletogram) was the likely reason for this failure. Hence no explicit
time information could be obtained.

Mallat’s algorithm for reconstructing a signal from the modulus maxima of its wavelet transform was
successfully implemented. This algorithm shows that a signal can be well represented in
terms of the modulus maxima of its wavelet transform. The reconstruction algorithm recovers
a close approximation of the sharp variation points. The error is below the threshold of perception. The
algorithm was applied to the stop consonant /k/ and a close replica was obtained, as described
in the previous chapter. This suggests that stop consonants can possibly be described in
terms of the modulus maxima of their wavelet transform at different levels or scales and
their positions. However, since the speech signal is highly oscillatory (see Fig. 4.17) and
there are a large number of local maxima in the wavelet transform domain, further data
reduction (e.g. through thresholding) is needed before this characterization can be
exploited. This needs further investigation.
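One possible form of the data reduction suggested above is sketched here. It is only an assumption about how such thresholding could be done, not something tested in this work: local modulus maxima are located at one level and those below a fraction of the largest modulus are discarded. The names `modulus_maxima`, `prune_maxima` and the parameter `frac` are hypothetical.

```python
def modulus_maxima(coeffs):
    """Indices n where |coeffs[n]| is a local maximum: at least as large as
    both neighbours and strictly larger than at least one of them."""
    idx = []
    for n in range(1, len(coeffs) - 1):
        left, mid, right = abs(coeffs[n - 1]), abs(coeffs[n]), abs(coeffs[n + 1])
        if mid >= left and mid >= right and (mid > left or mid > right):
            idx.append(n)
    return idx

def prune_maxima(coeffs, frac):
    """Keep (position, value) pairs of the maxima whose modulus is at least
    frac times the largest modulus among the maxima."""
    maxima = modulus_maxima(coeffs)
    if not maxima:
        return []
    peak = max(abs(coeffs[n]) for n in maxima)
    return [(n, coeffs[n]) for n in maxima if abs(coeffs[n]) >= frac * peak]

# Toy coefficients for one level (not thesis data).
level = [0.0, 1.0, 0.0, -3.0, 0.0, 0.5, 0.0]
kept = prune_maxima(level, frac=0.3)
```

How much pruning the reconstruction can tolerate before the replica degrades audibly would have to be established experimentally.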

In the present work, we have used the Daubechies-10 wavelet on a dyadic grid for ease of
computation. In hindsight, it is felt that another choice of wavelet (e.g. Butterworth [18])
and other possibilities (e.g. a scaling parameter 1 < a < 2) would have been more
appropriate.

Adaptive frequency tiling [19] and computational complexity are two other important factors
in the analysis. Adaptive tiling is especially important for the waveletogram. Computational
complexity is a very important factor in real-time applications.
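Appendix A

Projection Operator on $\Gamma$


In this appendix, we characterize the orthogonal projection on $\Gamma$ in one dimension
and also explain how to suppress oscillations. The operator $P_\Gamma$ transforms any sequence
$(g_j(x))_{j \in Z} \in K$ into the closest sequence $(h_j(x))_{j \in Z} \in \Gamma$ with respect to the norm $\| \cdot \|$. Let
$\epsilon_j(x) = h_j(x) - g_j(x)$. Each function $h_j(x)$ is chosen so that

$$\sum_{j=-\infty}^{+\infty} \left( \|\epsilon_j\|^2 + 2^{2j} \left\| \frac{d\epsilon_j}{dx} \right\|^2 \right) \qquad (6.1)$$

is minimum. To minimize this sum, we minimize separately each component

$$\|\epsilon_j\|^2 + 2^{2j} \left\| \frac{d\epsilon_j}{dx} \right\|^2 \qquad (6.2)$$

Let $x_0$ and $x_1$ be the abscissas of two consecutive modulus maxima of $W_{2^j} f(x)$. Since
$(h_j(x))_{j \in Z} \in \Gamma$, we have

$$\epsilon_j(x_0) = W_{2^j} f(x_0) - g_j(x_0) \qquad (6.3)$$

$$\epsilon_j(x_1) = W_{2^j} f(x_1) - g_j(x_1) \qquad (6.4)$$

Between the abscissas $x_0$ and $x_1$, the minimization of (6.1) is equivalent to the
minimization of

$$\int_{x_0}^{x_1} \left( |\epsilon_j(x)|^2 + 2^{2j} \left| \frac{d\epsilon_j(x)}{dx} \right|^2 \right) dx \qquad (6.5)$$

The Euler equation associated with this minimization is

$$\epsilon_j(x) - 2^{2j} \frac{d^2 \epsilon_j(x)}{dx^2} = 0 \qquad (6.6)$$

for $x \in [x_0, x_1]$. The constraints (6.3) and (6.4) are the border conditions of this membrane
equation. The solution is

$$\epsilon_j(x) = \alpha e^{2^{-j} x} + \beta e^{-2^{-j} x} \qquad (6.7)$$

where the constants $\alpha$ and $\beta$ are adjusted to satisfy the constraint equations (6.3) and
(6.4).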
We know that the modulus maxima of the original wavelet transform are located only at
the positions $(x_n^j)_{n \in Z}$. We can thus also impose sign constraints in order to suppress any
spurious oscillation in the reconstructed wavelet transform. This is done by imposing that the
solution belongs to an appropriate convex set $Y$. Let $\mathrm{sign}(x)$ be the sign of the real number $x$.
Let $Y$ be the set of sequences $(g_j(x))_{j \in Z} \in K$ such that for any pair of consecutive maxima
positions $(x_n^j, x_{n+1}^j)$ and $x \in [x_n^j, x_{n+1}^j]$,

$$\mathrm{sign}(g_j(x)) = \mathrm{sign}(x_n^j) \quad \text{if } \mathrm{sign}(x_n^j) = \mathrm{sign}(x_{n+1}^j) \qquad (6.8)$$

$$\mathrm{sign}\left( \frac{d g_j(x)}{dx} \right) = \mathrm{sign}(x_{n+1}^j - x_n^j) \quad \text{if } \mathrm{sign}(x_n^j) \neq \mathrm{sign}(x_{n+1}^j) \qquad (6.9)$$

In (6.8) and (6.9), $x_n^j$ inside $\mathrm{sign}(\cdot)$ stands for the value of the wavelet transform
$W_{2^j} f$ at the position $x_n^j$.

The set $Y$ is closed and convex, and $(W_{2^j} f)_{j \in Z} \in Y$. Instead of minimizing $\| \cdot \|$ over $\Gamma \cap V$, we
minimize it over $Y \cap \Gamma \cap V$. We thus alternate projections on $Y$, $\Gamma$, and $V$. To compute
the orthogonal projection on the convex set $Y$, we need to solve an elastic membrane problem
under constraints. This can be done with an iterative algorithm that is computationally
intensive. Instead we implement a simpler projector $P_Y$ on $Y$, which is not orthogonal with
respect to the norm $\| \cdot \|$. Let $(g_j(x))_{j \in Z} \in K$ and $P_Y (g_j(x))_{j \in Z} = (h_j(x))_{j \in Z}$. For each
index $j$, $h_j(x)$ is obtained by clipping the oscillations of $g_j(x)$.


